We design and analyze minimax-optimal algorithms for online linear optimization games where the player's choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. The standard benchmark is the loss of the best strategy chosen from a bounded comparator set. When the comparator set and the adversary's gradients satisfy L_infinity bounds, we give the value of the game in closed form and prove it approaches sqrt(2T/pi) as T -> infinity.

Interesting algorithms result when we consider soft constraints on the comparator, rather than restricting it to a bounded set. As a warmup, we analyze the game with a quadratic penalty. The value of this game is exactly T/2, and this value is achieved by perhaps the simplest online algorithm of all: unprojected gradient descent with a constant learning rate.

We then derive a minimax-optimal algorithm for a much softer penalty function. This algorithm achieves good bounds under the standard notion of regret for any comparator point, without needing to specify the comparator set in advance. The value of this game converges to sqrt(e) as T -> infinity; we give a closed form for the exact value as a function of T. The resulting algorithm is natural in unconstrained investment or betting scenarios, since it guarantees at worst constant loss, while allowing for exponential reward against an "easy" adversary.
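As a concrete illustration of the warmup algorithm the abstract mentions, the following is a minimal sketch (not taken from the paper) of unprojected online gradient descent with a constant learning rate on linear losses; the function name and the choice eta = 1 are illustrative assumptions.

```python
def unprojected_gd(gradients, eta=1.0):
    """Play x_t, suffer linear loss g_t * x_t, then update
    x_{t+1} = x_t - eta * g_t with no projection onto any set."""
    x = 0.0                      # start at the origin (1-d for simplicity)
    plays, total_loss = [], 0.0
    for g in gradients:
        plays.append(x)
        total_loss += g * x      # linear loss at round t
        x -= eta * g             # constant-step, unprojected update
    return plays, total_loss

# Example: an adversary that reports gradient +1 every round.
plays, loss = unprojected_gd([1.0] * 5)
print(plays)   # [0.0, -1.0, -2.0, -3.0, -4.0]
print(loss)    # -10.0
```

Because no projection is performed, the iterates can drift without bound, which is exactly the unconstrained setting the paper studies.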